Abstract — The problem of linear modulation classification 
using likelihood based methods is considered. Asymptotic prop- 
erties of most commonly used classifiers in the literature are 
derived. These classifiers are based on hybrid likelihood ratio test 
(HLRT) and average likelihood ratio test (ALRT) respectively. 
Both a single-sensor setting and a multi-sensor setting that 
uses a distributed decision fusion approach are analyzed. For 
a modulation classification system using a single sensor, it is 
shown that HLRT achieves asymptotically vanishing probability 
of error (Pe) whereas the same result cannot be proven for 
ALRT. In a multi-sensor setting using soft decision fusion, 
conditions are derived under which Pe vanishes asymptotically. 
Furthermore, the asymptotic analysis of the fusion rule that 
assumes independent sensor decisions is carried out. 

Index Terms — Automatic modulation classification, maximum 
likelihood classifier, decision fusion. 



I. Introduction 

Automatic modulation classification (AMC) is a signal 
processing technique that is used to estimate the modulation 
scheme corresponding to a received noisy communication 
signal. It plays a crucial role in various civilian and military
applications and has been widely used in communication
applications such as spectrum monitoring and adaptive
demodulation. AMC methods can be divided into
two general classes (see the survey paper [1]): 1) likelihood-based
(LB) and 2) feature-based (FB) methods. In this paper,
we focus on the former method which is based on the 
likelihood function of the received signal under each mod- 
ulation scheme, where the decision is made using a Bayesian 
hypothesis testing framework. The solution obtained by the LB 
method is optimal in the Bayesian sense, i.e., it minimizes the 
probability of incorrect classification. In the last two decades, 
extensive research has been conducted on AMC methods, 
which are mainly limited to methods based on receptions at 
a single sensor (communication receiver). A detailed survey 
on the AMC techniques using a single sensor can be found in
[1]. For a single sensor tasked with AMC, the classification
performance depends highly on the channel quality which 
directly affects the received signal strength. In non-cooperative 
communication environments, additional challenges exist that 
further complicate the problem. These challenges stem from 
unknown parameters such as signal-to-noise ratio (SNR) and 
phase offset. In order to alleviate classification performance
degradation in non-cooperative environments, network-centric
collaborative AMC approaches have been proposed in [2], [3],
[4], [5], [6]. It has been shown that the use of multiple sensors
has the potential of boosting effective SNR, thereby improving 
the probability of correct classification. 

In this paper, we focus on the likelihood based classification 
of linearly modulated signals, i.e., PSK and QAM signals. 
We notice that this problem is a composite hypothesis testing 
problem due to unknown signal parameters, i.e., uncertainty 
in the parameters of the probability density functions (pdfs) 
associated with different hypotheses. Various likelihood ratio 
based automatic modulation classification techniques have 
been proposed in the literature. An underlying assumption in 
all of these techniques is that each hypothesis has equally 
likely priors, in which case the classifiers reduce to maximum 
likelihood (ML) classifiers. These techniques take the form 
of a generalized likelihood ratio test (GLRT), an average 
likelihood ratio test (ALRT) or a hybrid likelihood ratio test 
(HLRT). A thorough review of these techniques can be found 
in Q. In the GLRT approach, all the unknown parameters 
are estimated using maximum likelihood (ML) methods and 
then a likelihood ratio test (LRT) is carried out by plugging 
in these estimates into the pdfs under both hypotheses. In 
addition to its complexity, GLRT has been shown to provide 
poor performance in classifying nested constellation schemes 
such as QAM [8]. In the ALRT approach [7], the unknown
signal parameters are marginalized out assuming certain priors,
converting the problem into a simple hypothesis testing problem.
In the HLRT approach [7], the likelihood function (LF)
is marginalized over the unknown constellation symbols and 
then the resulting average likelihood function (LF) is used to 
find the ML estimates of the remaining unknown parameters. 
These estimates are then plugged into the average LFs to 
carry out the LRT. Also, there are several variations of HLRT, 
which are called quasi HLRT (QHLRT), in which the ML 
estimates are replaced with other alternatives such as moment 
based estimators. We do not discuss the details here and refer 
the interested reader to [7] for further details. Our goal in
this paper is to derive asymptotic (in the number of observa- 
tions N) properties of modulation classification methods. We 
consider both single sensor and multiple sensor approaches. 
Although there has been extensive work on developing various 
methods for modulation classification, to the best of our 
knowledge, except for the work in [9], there is no work in the
literature that investigates asymptotic properties of modulation 
classification systems under single sensor or multi-sensor 
settings. In [9], the authors consider a coherent scenario where
the only unknown variables are the constellation symbols. In
this scenario, they analyze the asymptotic behavior of ML
classifiers for linear modulation schemes. Using the Kolmogorov-Smirnov
(K-S) distance, they show that the ML classification
error probability vanishes as $N \to \infty$. Our contributions in
this paper are as follows. We start with a single sensor system 
and analyze the asymptotic properties of two AMC scenarios: 

1) coherent scenario with known signal-to-noise ratio (SNR), 

2) non-coherent scenario with unknown SNR. Although the 
first scenario is the same as the one considered in [9], we
provide a much simpler proof which is then utilized to obtain 
the results for our second scenario. We analyze both HLRT 
and ALRT approaches. We do not consider GLRT due to its 
poor performance in classifying nested constellations. After 
analyzing single sensor approaches, we consider a multi-sensor 
setting as shown in Fig. 1. Under this framework, we analyze
a specific multi-sensor approach, namely distributed decision 
fusion for multi-hypothesis modulation classification where 
each sensor uses the LB approach to make its local decision. In 
this setting, there are L sensors observing the same unknown 
signal. Each sensor employs its own LB classifier and sends
its soft decision to a fusion center where a global decision
is made. We analyze the asymptotic properties of ALRT and
HLRT in this multi-sensor setting in the asymptotic region as
$N \to \infty$ and $L \to \infty$. We also provide implications of a large
number of observations for the fusion rule at the fusion center.

The rest of the paper is organized as follows. In Section II,
we introduce the system model and lay out our assumptions.
In Section III, we formulate the likelihood-based modulation
classification problem and summarize the HLRT and ALRT
approaches. We consider the single sensor case in Section IV
and analyze the asymptotic probability of classification error
under various settings. Similarly, the asymptotic probability
of classification error in the multi-sensor case is analyzed in
Section V. We provide numerical results that corroborate our
analyses in Section VI. Finally, concluding remarks along with
avenues for future work are provided in Section VII.



Fig. 1. Generic system model for a multi-sensor modulation classification
system. $s_l$ is the decision/data of the $l$th sensor, where $l = 1, \ldots, L$.



II. System Model Assumptions 

We consider a general linear modulation reception scenario 
with multiple receiving sensors assuming that the wireless 
communication channel between the unknown transmitter and 
each sensor undergoes flat block fading, i.e., the channel
impulse response is $h(t) = a e^{j\theta} \delta(t)$ over the observation
interval. After preprocessing, the received complex baseband
signal at each sensor can be expressed as [1]:

$$r(t) = s(t|\mathbf{u}) + v(t), \quad 0 \le t \le NT \tag{1}$$

$$s(t|\mathbf{u}) = a e^{j\theta} e^{j 2\pi \Delta f t} \sum_{n=0}^{N-1} I_n \, g_{tx}(t - nT - \epsilon T) \tag{2}$$

where $s(t|\mathbf{u})$ denotes the time-varying message signal; $\mathbf{u}$ represents
the unknown signal parameter vector; $a$ and $\theta$ are the
channel gain (or the signal amplitude) and the channel (or
the signal) phase, respectively; $v(t)$ is the additive zero-mean
white Gaussian noise; $g_{tx}(t)$ is the transmitted pulse; $T$ is the
symbol period; $\{I_n\}$ is the complex information sequence, i.e.,
the constellation symbol sequence; and $\epsilon$ and $\Delta f$ represent
residual time and frequency offsets, respectively. The constant
$\epsilon T$ represents the propagation time delay within a symbol
period where $\epsilon \in [0, 1)$. Throughout the paper, we assume
that $\epsilon$ and $\Delta f$ are perfectly known. Therefore, without loss of
generality, we set $\epsilon = \Delta f = 0$. The representation in (2) has
the implicit assumption that phase jitter is negligible. Without
loss of generality, we further assume that the constellation
symbols have unit power, i.e., $E[|I_n|^2] = 1$, where $E[\cdot]$
denotes statistical expectation. Note that the unknown phase
term denoted by $\theta$ in (2) subsumes both the unknown channel
phase and unknown carrier phase. Similarly, the unknown
signal amplitude $a$ subsumes the unknown signal amplitude
as well as the unknown channel gain.

After filtering the received signal with a pulse-matched filter
$g_{rx}(t)$ and sampling at a rate of $Q/T$, where $Q$ is an integer,
the following discrete-time observation sequence is obtained
[1]:

$$r_k = s_k(\mathbf{u}) + w_k \tag{3}$$

$$s_k(\mathbf{u}) = a e^{j\theta} \sum_{n=0}^{N-1} I_n \, g(kT/Q - nT), \tag{4}$$

where $g(t) = g_{tx}(t) * g_{rx}(t)$ with $*$ denoting the convolution
operator, $r_k = r(t) * g_{rx}(t)|_{t=kT/Q}$, $w_k = v(t) * g_{rx}(t)|_{t=kT/Q}$,
$N$ is the total number of observed information symbols, and
$k = 0, \ldots, K-1$. Note that $N = K/Q$, i.e., there are $Q$
samples per symbol. For simplicity, we assume that $g_{tx}(t)$ is
a rectangular pulse where $g(t) = 1$, $0 \le t \le T$. We further
assume $Q = 1$ and that $w_n$ is independent identically distributed
(i.i.d.) circularly symmetric complex Gaussian noise with real
and imaginary parts of variance $N_0/2$, i.e., $w_n \sim \mathcal{CN}(0, N_0)$.
Our analysis in this paper can be easily generalized to other
pulse shapes and cases where $Q > 1$. Under these assumptions,
the received observation sequence can be written as:

$$r_n = a e^{j\theta} I_n + w_n, \quad n = 0, \ldots, N-1. \tag{5}$$

The above signal model is a commonly used model in the modulation
classification literature [1], [8], [11], [12]. Note that
$a$, $\theta$, and $\{I_n\}_{n=0}^{N-1}$ are the unknown signal parameters. In a
general modulation classification scenario, in addition to the
unknown signal parameters, the noise power $N_0$ may also be
unknown. In this case, the unknown parameter vector can be
written as $\mathbf{u} = [a, \theta, N_0, \{I_n\}_{n=0}^{N-1}]$.
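The discrete-time model in (5) is straightforward to simulate. The following is a minimal sketch, assuming NumPy; the helper names `psk` and `simulate_observations` are hypothetical and do not come from the paper. It draws i.i.d., equally likely symbols from a unit-power constellation and adds circularly symmetric complex Gaussian noise of total variance $N_0$.

```python
import numpy as np


def psk(M):
    """Unit-power M-PSK constellation exp(j*2*pi*m/M), m = 0, ..., M-1."""
    return np.exp(2j * np.pi * np.arange(M) / M)


def simulate_observations(constellation, N, a, theta, N0, rng):
    """Draw N samples of r_n = a*exp(j*theta)*I_n + w_n, with w_n ~ CN(0, N0)."""
    symbols = rng.choice(constellation, size=N)  # i.i.d., equally likely symbols
    # Real and imaginary noise parts each have variance N0/2.
    noise = np.sqrt(N0 / 2) * (rng.standard_normal(N) + 1j * rng.standard_normal(N))
    return a * np.exp(1j * theta) * symbols + noise


rng = np.random.default_rng(0)
r = simulate_observations(psk(4), N=1000, a=1.0, theta=0.3, N0=0.5, rng=rng)
# For unit-power symbols the average received power is close to a**2 + N0.
avg_power = np.mean(np.abs(r) ** 2)
```

With unit-power symbols, the ratio $a^2/N_0$ plays the role of the SNR in this sketch.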
III. Likelihood-based Linear Modulation 
Classification 

Our goal throughout this paper is to gain insights into 
the modulation classification problem using the assumptions 
commonly made in the modulation classification literature. 
Suppose there are S candidate modulation formats under 
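consideration. Let $\mathbf{r}$ denote the observation vector defined as
$\mathbf{r} := [r_0, \ldots, r_{N-1}]$ and $I_n^{(i)}$ denote the constellation symbol
at time $n$ corresponding to modulation $i \in \{1, \ldots, S\}$. The
conditional pdf of $\mathbf{r}$ conditioned on the unknown modulation
format $i$ and the unknown parameter vector $\mathbf{u}$, i.e., the
likelihood function (LF), is given by

$$p_i(\mathbf{r}|\mathbf{u}) = \frac{1}{(\pi N_0)^N} \exp\left(-\frac{1}{N_0} \sum_{n=0}^{N-1} \left| r_n - a e^{j\theta} I_n^{(i)} \right|^2 \right). \tag{6}$$

If the transmitted signal is an M-PSK signal, the constellation
symbol set is given as $\mathcal{S}_P^M = \{e^{j 2\pi m/M} \,|\, m = 0, \ldots, M-1\}$
and $I_n^{(i)} \in \mathcal{S}_P^M$. Otherwise, if the transmitted signal is
an M-QAM signal, the constellation symbol set is
$\mathcal{S}_Q^M = \{b_m e^{j\theta_m} \,|\, m = 0, \ldots, M-1\}$ and $I_n^{(i)} \in \mathcal{S}_Q^M$.$^1$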
Note that the LF in (6) is parameterized by the modulation
scheme under consideration and the only difference between
conditional pdfs of different modulation schemes comes from
the constellation symbols $I_n$. In a Bayesian setting, the optimal
classifier in terms of minimum probability of classification
error is the maximum a posteriori (MAP) classifier. If no
a priori information is available on the probability of the
modulation scheme employed by the transmitter, which is usually the
case in a noncooperative environment, one can use a non-informative
prior, i.e., each modulation scheme is assigned an
identical prior probability. This is the assumed scenario in this
paper. In this case, the optimal classifier takes the form of the
maximum likelihood (ML) classifier.

Let us first consider the HLRT approach, where the LF
is averaged over the unknown constellation symbols $I_n$ and
then maximized over the remaining unknown parameters. The
modulation scheme that maximizes the resulting LF is selected
as the final decision, i.e.,

$$\hat{i} = \arg\max_{i=1,\ldots,S} \left( \max_{a, \theta, N_0} E_{I_n^{(i)}}\{p_i(\mathbf{r}|\mathbf{u})\} \right), \tag{7}$$

where $E_x[\cdot]$ denotes the expectation operator with respect to
the random variable $x$, and $I_n^{(i)}$ is the unknown constellation
symbol for modulation format $i$.

In the ALRT approach, the unknown parameters are all
marginalized out, resulting in the marginal likelihood function
which is used to make the final decision as

$$\hat{i} = \arg\max_{i=1,\ldots,S} E_{\mathbf{u}}\{p_i(\mathbf{r}|\mathbf{u})\}. \tag{8}$$

In the next section, we analyze the probability of classification 
error starting with a single sensor setting followed by a multi- 
sensor setting. 

$^1$ In certain cases, these sets can be rotated by some fixed phase, e.g., QPSK
is represented as a version of $\mathcal{S}_P^4$ rotated by $e^{j\pi/4}$. This does not affect our
results.



IV. Asymptotic Probability of Error Analysis: 
Single Sensor Case 

A. Scenario 1: Coherent Reception with Known SNR 

In this scenario, the only unknown variables are the data
symbols $I_n$, $n = 1, \ldots, N$. In this case, without loss of
generality, the received complex signal can be expressed as

$$r_n = I_n + w_n, \quad n = 1, \ldots, N. \tag{9}$$

Assuming independent information symbols and white sensor
noise, the LF averaged over the unknown constellation symbols
under modulation format $i$ is given as

$$p_i(\mathbf{r}) := p(\mathbf{r}|H_i) = \prod_{n=1}^{N} p(r_n|H_i), \tag{10}$$

where

$$p(r_n|H_i) = \sum_{m=1}^{M_i} p(r_n|I^{m,(i)}, H_i) \, P(I^{m,(i)}|H_i). \tag{11}$$

In (11), $M_i$ and $I^{m,(i)}$ are the number of constellation symbols
and the $m$th constellation symbol for modulation class $i$,
respectively. In general, the constellation symbols are assumed
to have equal a priori probabilities, i.e., $P(I^{m,(i)}|H_i) = 1/M_i$,
which results in

$$p(r_n|H_i) = \frac{1}{M_i} \sum_{m=1}^{M_i} p(r_n|I^{m,(i)}, H_i), \tag{12}$$

where

$$p(r_n|I^{m,(i)}, H_i) = \frac{1}{\pi N_0} \exp\left(-\frac{1}{N_0} \left| r_n - I^{m,(i)} \right|^2 \right). \tag{13}$$

In this case, $p(r_n|H_i)$ in (12) represents a complex Gaussian
mixture model (GMM), or a complex Gaussian mixture
distribution, with $M_i$ homoscedastic components where each
component has identical occurrence probability (weight) $1/M_i$
as well as identical variance $N_0$, and the mean of each
component is one of the unique constellation symbols in
modulation format $i$. Let us revisit the generic expression for
a complex GMM denoted by $f(r)$:

$$f(r) = \sum_{i=1}^{M} w_i \, \phi(r; \mu_i, \sigma_i^2) \tag{14}$$

where

$$\phi(r; \mu_i, \sigma_i^2) = \frac{1}{\pi \sigma_i^2} \exp\left(-\frac{|r - \mu_i|^2}{\sigma_i^2}\right). \tag{15}$$

We know that a GMM given by (14) and (15) is completely
parameterized by the set $\{w_i, \mu_i, \sigma_i\}_{i=1}^{M}$ [14].

Remark 1: For a given modulation format $i$, the Gaussian
mixture model (GMM) in (12) is completely parameterized
by the means of the components in the mixture, i.e., by the
constellation symbol set $\mathcal{S}^{(i)} = \{I^{1,(i)}, \ldots, I^{M_i,(i)}\}$. In other
words, if $\mathcal{S}^{(i)} \neq \mathcal{S}^{(j)}$, then $p(r_n|H_i)$ and $p(r_n|H_j)$ represent
two different GMMs.
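To make the mixture in (12) and (13) concrete, the following sketch evaluates $\log p(r_n|H_i)$ for an arbitrary constellation. It assumes NumPy; the function name `log_mixture_pdf` is hypothetical, and the log-sum-exp step is a numerical convenience added here rather than part of the paper.

```python
import numpy as np


def log_mixture_pdf(r, constellation, N0):
    """Per-sample log p(r_n | H_i) for the equal-weight complex GMM in (12)-(13)."""
    r = np.atleast_1d(np.asarray(r, dtype=complex))
    c = np.asarray(constellation, dtype=complex)
    # Exponent of each Gaussian component: -|r_n - I^m|^2 / N0.
    expo = -np.abs(r[:, None] - c[None, :]) ** 2 / N0
    # Log-sum-exp over the components for numerical stability.
    peak = expo.max(axis=1, keepdims=True)
    log_sum = peak[:, 0] + np.log(np.exp(expo - peak).sum(axis=1))
    return log_sum - np.log(len(c)) - np.log(np.pi * N0)


bpsk = np.array([1.0, -1.0])
value = log_mixture_pdf(1.0, bpsk, N0=1.0)[0]
# Closed form at r = 1: (1/2) * (1/pi) * (exp(0) + exp(-4)).
expected = np.log(0.5 * (1.0 + np.exp(-4.0)) / np.pi)
```

Because the weights and variance are shared, only the constellation argument distinguishes one hypothesis from another, which is the content of Remark 1.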
Let us now define the test statistics

$$\Lambda_i := -\frac{1}{N} \log p_i(\mathbf{r}) = -\frac{1}{N} \sum_{n=1}^{N} \log p(r_n|H_i). \tag{16}$$

Then, the ML classifier is given as

$$\hat{i} = \arg\min_{i=1,\ldots,S} \Lambda_i. \tag{17}$$

The classifier performance can be quantified in terms of the
average probability of error ($P_e$) given as

$$P_e = \frac{1}{S} \sum_{i=1}^{S} P_e^i, \tag{18}$$

where $P_e^i$ is the probability of error under hypothesis $H_i$, i.e.,
given that modulation $i$ is the true modulation:

$$P_e^i = 1 - P(\Lambda_i < \Lambda_j | H_i), \quad \forall j \neq i. \tag{19}$$
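The classifier in (16) and (17) can be sketched in a few lines. The example below assumes NumPy, a known noise power, and the coherent model in (9); the names `psk`, `lambda_stat`, and `ml_classify`, as well as the chosen SNR and sample size, are illustrative assumptions rather than the authors' setup.

```python
import numpy as np


def psk(M):
    """Unit-power M-PSK constellation."""
    return np.exp(2j * np.pi * np.arange(M) / M)


def lambda_stat(r, constellation, N0):
    """Lambda_i = -(1/N) * sum_n log p(r_n | H_i), with p as in (12)-(13)."""
    expo = -np.abs(r[:, None] - constellation[None, :]) ** 2 / N0
    peak = expo.max(axis=1, keepdims=True)
    log_p = (peak[:, 0] + np.log(np.exp(expo - peak).sum(axis=1))
             - np.log(len(constellation)) - np.log(np.pi * N0))
    return -np.mean(log_p)


def ml_classify(r, candidates, N0):
    """Index of the candidate with the smallest test statistic, as in (17)."""
    return int(np.argmin([lambda_stat(r, c, N0) for c in candidates]))


rng = np.random.default_rng(1)
candidates = [psk(2), psk(4), psk(8)]
N0 = 0.1  # known noise power (10 dB SNR for unit-power symbols)
symbols = rng.choice(candidates[1], size=500)  # true modulation: QPSK
noise = np.sqrt(N0 / 2) * (rng.standard_normal(500) + 1j * rng.standard_normal(500))
decision = ml_classify(symbols + noise, candidates, N0)
```

Repeating the experiment over many noise realizations and true modulations would give a Monte Carlo estimate of $P_e$ in (18).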



Now, we can state the following theorem which shows that the
probability of error of the ML classifier vanishes asymptotically
as $N \to \infty$. Note that the same result was also obtained
in [9] using the Kolmogorov-Smirnov (K-S) distance. Here, we
provide a simpler proof than the one in [9].

Theorem 1: The ML classifier in (17) asymptotically attains
zero probability of error for classifying digital amplitude-phase
modulations regardless of the received SNR, i.e.,

$$\lim_{N \to \infty} P_e = 0. \tag{20}$$



Proof: Suppose $H_i$ is the true hypothesis. In order to
study the asymptotic ($N \to \infty$) behavior of $\Lambda_j(\mathbf{r})$ under
$H_i$, we follow the same technique as in [15] and write the
following using the law of large numbers:

$$\lim_{N \to \infty} \Lambda_j(\mathbf{r}) = -E_i[\log p_j(\mathbf{r})] \tag{21}$$

$$= E_i[\log(p_i(\mathbf{r})/p_j(\mathbf{r}))] - E_i[\log p_i(\mathbf{r})] \tag{22}$$

$$= D(p_i\|p_j) + h_i(\mathbf{r}) \tag{23}$$

where $E_i[\cdot]$ is the expectation under $H_i$, $D(p_i\|p_j)$ is the
Kullback-Leibler (KL) distance between $p_i$ and $p_j$ defined as
$D(p_i\|p_j) := E_i[\log(p_i(\mathbf{r})/p_j(\mathbf{r}))]$, and $h_i(\mathbf{r})$ is the differential
entropy defined as $h_i(\mathbf{r}) := -E_i[\log p_i(\mathbf{r})]$ [16]. Note that
$h_i(\mathbf{r})$ is not a function of any modulation $j \neq i$. Therefore,
under $H_i$, the only difference between test statistics $\Lambda_i$ and $\Lambda_j$
is the KL distance $D(p_i\|p_j) \geq 0$, which is equal to zero if
and only if $p_j = p_i$. Now, let us revisit the ML classification
rule given in (17):



$$\hat{j} = \arg\min_{j=1,\ldots,S} \lim_{N \to \infty} \Lambda_j(\mathbf{r}). \tag{24}$$

Since the second term in (23) is independent of the test
statistic under consideration, i.e., $\Lambda_j$, the only difference
between different test statistics results from the first term
in (23), which is the KL distance $D(p_i\|p_j)$. If $D(p_i\|p_j) > 0$
for $j \neq i$ and $D(p_i\|p_j) = 0$ for $j = i$, the ML classifier in
(24) will always decide

$$i = \arg\min_{j=1,\ldots,S} \lim_{N \to \infty} \Lambda_j(\mathbf{r}). \tag{25}$$

Therefore, (25) implies that perfect classification is obtained
for any given SNR in the limit as $N \to \infty$ if and only if
$D(p_i\|p_j) > 0$, $\forall j, i$, $j \neq i$. For digital phase-amplitude
modulations, we know from (12) that $p_i(\mathbf{r})$ represents a GMM
and each modulation format corresponds to a unique GMM
(see Remark 1). Therefore, $D(p_i\|p_j) > 0$, $\forall j, i$, $j \neq i$, which
is the only condition needed for asymptotically vanishing error
probability of the ML classifier. ■
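The key step of the proof can be checked numerically: by (23), the gap $\Lambda_j - \Lambda_i$ under $H_i$ converges to $D(p_i\|p_j)$ because the entropy term cancels. The sketch below, which assumes NumPy and uses hypothetical helper names and an arbitrary noise power, estimates this gap by Monte Carlo for a QPSK source tested against BPSK and 8-PSK.

```python
import numpy as np


def psk(M):
    """Unit-power M-PSK constellation."""
    return np.exp(2j * np.pi * np.arange(M) / M)


def avg_neg_loglik(r, constellation, N0):
    """Sample version of -E_i[log p_j(r_n)] for the mixture in (12)-(13)."""
    expo = -np.abs(r[:, None] - constellation[None, :]) ** 2 / N0
    peak = expo.max(axis=1, keepdims=True)
    log_p = (peak[:, 0] + np.log(np.exp(expo - peak).sum(axis=1))
             - np.log(len(constellation)) - np.log(np.pi * N0))
    return -np.mean(log_p)


rng = np.random.default_rng(2)
N, N0 = 5000, 0.2
true = psk(4)
noise = np.sqrt(N0 / 2) * (rng.standard_normal(N) + 1j * rng.standard_normal(N))
r = rng.choice(true, size=N) + noise

# Lambda_j - Lambda_i estimates D(p_i || p_j), since h_i cancels in (23).
kl_estimates = {M: avg_neg_loglik(r, psk(M), N0) - avg_neg_loglik(r, true, N0)
                for M in (2, 8)}
```

Both estimates come out strictly positive, and the 8-PSK gap is the smaller of the two, which is consistent with nested constellations being the harder pair to separate.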



B. Noncoherent Reception with Unknown SNR 

In this scenario, the received complex signal is expressed as

$$r_n = a e^{j\theta} I_n + w_n, \quad n = 1, \ldots, N. \tag{26}$$

In this case, in addition to the unknown constellation symbols,
there are three more unknown parameters which are channel
amplitude ($a$), channel phase ($\theta$), and noise power ($N_0$). We
will denote these additional unknown parameters in vector
form as $\mathbf{u} = [a, N_0, \theta]$, where $a \in [0, \infty)$, $N_0 \in [0, \infty)$ and
$\theta \in [0, 2\pi)$.

Let us first consider the HLRT approach, where the unknown
data symbols are marginalized out and the remaining
unknown parameters are estimated using an ML estimator. In
HLRT, these ML estimates are plugged into the likelihood
function to perform the ML classification task. In practice,
the complex channel gain $a e^{j\theta}$ can be either random or
deterministic depending on the application. In deep-space
communications, the channel gain can be assumed to be
a deterministic time-independent constant [17], whereas in
urban wireless communications, the channel gain is often
assumed to be random due to multipath effects resulting
in fading. In fading channels, the duration over which the
channel gain remains constant depends on the coherence time
of the channel. Nevertheless, in HLRT, the channel gain is
always treated as a deterministic unknown regardless of the
application and ML estimation is employed to estimate $a$ and
$\theta$. The resulting likelihood function for modulation $i$ can be
written as

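$$p_i(\mathbf{r}|\hat{\mathbf{u}}_i) := p(\mathbf{r}|H_i, \hat{\mathbf{u}}_i) = \prod_{n=1}^{N} p(r_n|H_i, \hat{\mathbf{u}}_i), \tag{27}$$

where

$$p(r_n|H_i, \hat{\mathbf{u}}_i) = \frac{1}{M_i} \sum_{m=1}^{M_i} p(r_n|H_i, \hat{\mathbf{u}}_i, I^{m,(i)}), \tag{28}$$

$$\hat{\mathbf{u}}_i = \arg\max_{\mathbf{u}} \prod_{n=1}^{N} p(r_n|H_i, \mathbf{u}). \tag{29}$$

In order to be explicit, we re-write

$$p(r_n|H_i, \hat{\mathbf{u}}_i) = \frac{1}{M_i} \sum_{m=1}^{M_i} \frac{1}{\pi \hat{N}_{0,i}} \exp\left(-\frac{\left| r_n - \hat{a}_i e^{j\hat{\theta}_i} I^{m,(i)} \right|^2}{\hat{N}_{0,i}}\right). \tag{30}$$

From (30), we can see that $p(r_n|H_i, \hat{\mathbf{u}}_i)$ represents a complex
GMM with $M_i$ homoscedastic components where each component
has identical occurrence probability $1/M_i$ as well as
identical variance $\hat{N}_{0,i}$, and the mean of each component is
one of the unique constellation symbols in modulation format
$i$ multiplied by $\hat{a}_i e^{j\hat{\theta}_i}$.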



IEEE TRANSACTIONS ON WIRELESS COMMUNICATIONS (DRAFT) 






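We can define the new test statistics, which now include
the estimates of the unknown parameters, as

$$\Lambda_i(\mathbf{r}, \hat{\mathbf{u}}_i) := -\frac{1}{N} \log p_i(\mathbf{r}|\hat{\mathbf{u}}_i) = -\frac{1}{N} \sum_{n=1}^{N} \log p(r_n|H_i, \hat{\mathbf{u}}_i). \tag{31}$$

Then (29) can be equivalently written as

$$\hat{\mathbf{u}}_i = \arg\min_{\mathbf{u}} \Lambda_i(\mathbf{r}, \mathbf{u}), \tag{32}$$

and the ML classifier is given as

$$\hat{i} = \arg\min_{i=1,\ldots,S} \Lambda_i(\mathbf{r}, \hat{\mathbf{u}}_i). \tag{33}$$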
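A rough way to see (31) through (33) in action is to replace the exact ML search with an exhaustive grid over $[a, N_0, \theta]$. The sketch below assumes NumPy; the grid resolution, parameter bounds, true parameter values, and function names are all illustrative assumptions, and a grid search is only a crude stand-in for the multi-dimensional maximization the HLRT requires.

```python
import itertools
import numpy as np


def psk(M):
    """Unit-power M-PSK constellation."""
    return np.exp(2j * np.pi * np.arange(M) / M)


def neg_avg_loglik(r, constellation, a, theta, N0):
    """Lambda_i(r, u) from (31) for a trial parameter vector u = [a, N0, theta]."""
    means = a * np.exp(1j * theta) * constellation
    expo = -np.abs(r[:, None] - means[None, :]) ** 2 / N0
    peak = expo.max(axis=1, keepdims=True)
    log_p = (peak[:, 0] + np.log(np.exp(expo - peak).sum(axis=1))
             - np.log(len(constellation)) - np.log(np.pi * N0))
    return -np.mean(log_p)


def hlrt_statistic(r, constellation, theta_max, a_grid, N0_grid, n_theta=8):
    """Grid-search approximation of min_u Lambda_i(r, u), as in (32)."""
    theta_grid = np.linspace(0.0, theta_max, n_theta, endpoint=False)
    return min(neg_avg_loglik(r, constellation, a, th, N0)
               for a, th, N0 in itertools.product(a_grid, theta_grid, N0_grid))


rng = np.random.default_rng(3)
orders = [2, 4, 8]
a_grid = np.linspace(0.5, 1.5, 11)
N0_grid = np.linspace(0.05, 0.5, 10)
N = 600
noise = np.sqrt(0.1 / 2) * (rng.standard_normal(N) + 1j * rng.standard_normal(N))
# True modulation: QPSK with a = 1, theta = 0.2 rad, N0 = 0.1 (unknown to the classifier).
r = np.exp(1j * 0.2) * rng.choice(psk(4), size=N) + noise

stats = [hlrt_statistic(r, psk(M), 2 * np.pi / M, a_grid, N0_grid) for M in orders]
decision = orders[int(np.argmin(stats))]
```

The phase grid for each M-PSK candidate is limited to $[0, 2\pi/M)$, which anticipates the periodicity argument made in the text that follows.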



We start the analysis by making the following observations.
In practice, there is always some a priori knowledge on the
bounds of the unknown parameters $a$ and $N_0$. In other words,
the search space for the maximization of the likelihood function
with respect to $a$ and $N_0$ can be confined to $[0, A^U]$ and
$[0, N^U]$, respectively, for some known $A^U$ and $N^U$. Regarding
the unknown phase $\theta$, the search space depends on the modulation
class that is under consideration. For M-PSK modulations,
it suffices to limit the search space of $\theta$ to $[0, 2\pi/M)$, because
the likelihood function is a periodic function of $\theta$ with a
period of $2\pi/M$. This is due to averaging over the unknown
constellation symbols: rotating the constellation map by
$2\pi/M$ results in the same constellation map as far as the
likelihood function averaged over the constellation symbols is
concerned. Similarly, for M-QAM modulations, it suffices to
limit the search space of $\theta$ to $[0, \pi/2)$ for the same
reasons. We now make
the following assumption which will simplify mathematical
analysis. We assume that the unknown parameters $[a, N_0, \theta]$ lie
in the interior region of the cube $[0, A^U] \times [0, N^U] \times [0, 2\pi/M]$
for M-PSK or $[0, A^U] \times [0, N^U] \times [0, \pi/2]$ for M-QAM,
respectively. Note that these assumptions are almost always
satisfied in practice. Let us denote this closed Euclidean space
as $\mathcal{U} := [0, A^U] \times [0, N^U] \times [0, \theta^U]$, where $\theta^U = 2\pi/M$ for
M-PSK and $\theta^U = \pi/2$ for M-QAM.
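The periodicity claim is easy to verify numerically: rotating an M-PSK constellation by $2\pi/M$ only permutes its points, so the symbol-averaged likelihood is unchanged. The following sketch assumes NumPy and uses arbitrary test data and hypothetical function names.

```python
import numpy as np


def psk(M):
    """Unit-power M-PSK constellation."""
    return np.exp(2j * np.pi * np.arange(M) / M)


def neg_avg_loglik(r, constellation, a, theta, N0):
    """Symbol-averaged negative log-likelihood per sample for u = [a, N0, theta]."""
    means = a * np.exp(1j * theta) * constellation
    expo = -np.abs(r[:, None] - means[None, :]) ** 2 / N0
    peak = expo.max(axis=1, keepdims=True)
    log_p = (peak[:, 0] + np.log(np.exp(expo - peak).sum(axis=1))
             - np.log(len(constellation)) - np.log(np.pi * N0))
    return -np.mean(log_p)


rng = np.random.default_rng(4)
r = rng.standard_normal(50) + 1j * rng.standard_normal(50)  # arbitrary observations
M, a, theta, N0 = 4, 0.9, 0.37, 0.3

base = neg_avg_loglik(r, psk(M), a, theta, N0)
# Shifting theta by one period 2*pi/M permutes the component means.
shifted = neg_avg_loglik(r, psk(M), a, theta + 2 * np.pi / M, N0)
```

The same check with a square QAM constellation and a shift of $\pi/2$ would illustrate the M-QAM case.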

Lemma 1: Let $\mathcal{S}$ denote the set of PSK and QAM modulation
classes. Define $p_i(\mathbf{r}|\hat{\mathbf{u}}_i) := p(\mathbf{r}|H_i, \hat{\mathbf{u}}_i)$. Let $i, j \in \mathcal{S}$,
$\hat{\mathbf{u}}_i \in \mathcal{U}_i$, $\hat{\mathbf{u}}_j \in \mathcal{U}_j$. If $i \neq j$, then

$$D(p_i(\mathbf{r}|\hat{\mathbf{u}}_i) \,\|\, p_j(\mathbf{r}|\hat{\mathbf{u}}_j)) > 0. \tag{34}$$

Proof: See Appendix A. ■
The following theorem states that the probability of error
of the HLRT classifier vanishes asymptotically as $N \to \infty$.

Theorem 2: The ML classifier in (33) asymptotically attains
zero probability of error for classifying digital amplitude-phase
modulations regardless of the received SNR.

Proof: Suppose $H_i$ is the true hypothesis and $\mathbf{u}^*$ denotes
the true value of the unknown parameter. We start by noting
that the maximum likelihood estimator (MLE) is consistent
under some mild regularity conditions [18], which are satisfied
by the likelihood functions of digital amplitude-phase
modulations. In other words, if $H_i$ is the true hypothesis and
$\mathbf{u}^*$ is the true value of the unknown parameter $\mathbf{u}$, then

$$\mathbf{u}^* = \arg\min_{\mathbf{u}} \lim_{N \to \infty} \Lambda_i(\mathbf{r}, \mathbf{u}). \tag{35}$$

Under $H_i$, we write the following using the law of large
numbers

$$\lim_{N \to \infty} \Lambda_j(\mathbf{r}, \hat{\mathbf{u}}_j) = -E_i[\log p_j(\mathbf{r}|\hat{\mathbf{u}}_j)], \tag{36}$$

where $E_i[\cdot]$ denotes expectation with respect to $p(\mathbf{r}|H_i, \mathbf{u}^*)$.
Then, (36) can be written as

$$\lim_{N \to \infty} \Lambda_j(\mathbf{r}, \hat{\mathbf{u}}_j) = E_i[\log(p_i(\mathbf{r}|\mathbf{u}^*)/p_j(\mathbf{r}|\hat{\mathbf{u}}_j))] - E_i[\log p_i(\mathbf{r}|\mathbf{u}^*)] \tag{37}$$

$$= D(p_i(\mathbf{r}|\mathbf{u}^*) \,\|\, p_j(\mathbf{r}|\hat{\mathbf{u}}_j)) + h_i(\mathbf{r}|\mathbf{u}^*) \tag{38}$$

where the second term is the differential entropy of the true
distribution defined as $h_i(\mathbf{r}|\mathbf{u}^*) := -E_i[\log p(\mathbf{r}|H_i, \mathbf{u}^*)]$. The
proof follows from Lemma 1 and the same reasoning as in
Theorem 1. ■

From (38), we can make the following observation. Under $H_i$
and the true parameter $\mathbf{u}^*$,

$$\hat{\mathbf{u}}_j = \arg\min_{\mathbf{u}} \lim_{N \to \infty} \Lambda_j(\mathbf{r}, \mathbf{u}) \tag{39}$$

$$= \arg\min_{\mathbf{u}} D(p_i(\mathbf{r}|\mathbf{u}^*) \,\|\, p_j(\mathbf{r}|\mathbf{u})). \tag{40}$$

As $N \to \infty$, the MLE $\hat{\mathbf{u}}_j$ minimizes the KL distance between
the true and the assumed distributions. This was observed by
Akaike [19] in the area of maximum likelihood
estimation under misspecified models (see also [20]). We
should also emphasize that the consistency of the ML estimator
is necessary for $P_e$ to vanish as $N \to \infty$, as otherwise one
cannot deduce (38) from (37). As one would expect, the result
in Theorem 2 is useful in practice only when the channel gain
remains constant over a large observation interval. Channels
that exhibit such a behavior include deep space communication
channels as well as slowly varying fading channels.

Next, we consider a variation of the HLRT approach where,
in addition to the unknown data symbols, a subset of the remaining
unknown parameters is marginalized out. Then the maximization
is carried out over the remaining subset. Let $\mathbf{u}^0$ denote the
subset of the unknown parameters that are marginalized out
and $f_{\mathbf{u}^0}(\mathbf{u}^0)$ denote the joint a priori distribution of $\mathbf{u}^0$. Let
$\mathbf{u}^1$ denote the vector of the remaining unknown parameters
over which the maximization is carried out. Then, the ML
classifier is given as

$$\hat{i} = \arg\max_{i=1,\ldots,S} p_i(\mathbf{r}|\hat{\mathbf{u}}_i^1) \tag{41}$$

$$\hat{\mathbf{u}}_i^1 = \arg\max_{\mathbf{u}^1} p_i(\mathbf{r}|\mathbf{u}^1) \tag{42}$$

where

$$p_i(\mathbf{r}|\mathbf{u}^1) = \int p_i(\mathbf{r}|\mathbf{u}^1, \mathbf{u}^0) \, f_{\mathbf{u}^0}(\mathbf{u}^0) \, d\mathbf{u}^0. \tag{43}$$

Since the unknowns $[a, N_0, \theta]$ stay constant over the observation
interval, it is clear from (43) that the observations $r_n$
become dependent after averaging (conditional independence
is no longer valid), i.e.,

$$p_i(\mathbf{r}|\mathbf{u}^1) \neq \prod_{n=1}^{N} p(r_n|H_i, \mathbf{u}^1). \tag{44}$$

Due to this dependence, the law of large numbers cannot be
invoked. Therefore, these classifiers do not have provably
vanishing $P_e$ in the asymptotic regime as $N \to \infty$. This is

also the case for the ALRT approach where all the unknowns 
are marginalized out before classification. In practice, ALRT 
may be preferred over HLRT since the latter requires multi- 
dimensional maximization of the LF which is generally a 
non-convex optimization problem. In order to alleviate this 
problem, a suboptimal HLRT called quasi-HLRT (or QHLRT)
was proposed in [8], [12], where the MLEs of the unknown
parameters were replaced with moment-based estimators. In
general, QHLRT does not guarantee provably asymptotically
vanishing $P_e$, since these estimators are generally not consistent.

V. Asymptotic Probability of Error Analysis: 
Multi-Sensor Case 

In this section, we consider a multi-sensor setting where 
each sensor transmits its soft decision to a fusion center where 
a global decision is made. We start our analyses assuming soft 
decision fusion where each sensor sends its unquantized local 
likelihood value to the fusion center. 

In a multiple sensor scenario, the set of unknown parameters 
$\{a, \theta, N_0\}$ corresponding to each sensor is independent of
that of other sensors. However, care must be taken to analyze 
this scenario as the independence of these unknowns does not 
guarantee the independence of different sensor observations. 
In the following, we will investigate the multiple sensor 
scenario and derive conditions under which the asymptotic 
error probability goes to zero. 

A. Scenario 1: Coherent Reception with Known SNR 

We first consider the general case for the coherent and
synchronous environment where there are $L$ sensors and
each sensor $l$ ($l = 1, \ldots, L$) makes $N$ observations. Let us
define the vector of observations for each sensor as
$\mathbf{r}_l := [r_{n_{l_1}}, \ldots, r_{n_{l_N}}]$, $l = 1, \ldots, L$. We also define the set of
indices for the complex information sequence that each sensor
observes as

$$\mathcal{I}_l := \{n_{l_1}, \ldots, n_{l_N}\}, \quad l = 1, \ldots, L. \tag{45}$$

Similar to (10)-(12), the likelihood function at sensor $l$ is

$$p_i(\mathbf{r}_l) := p(\mathbf{r}_l|H_i) = \prod_{n \in \mathcal{I}_l} p(r_n|H_i), \tag{46}$$

$$p(r_n|H_i) = \frac{1}{M_i} \sum_{m=1}^{M_i} p(r_n|I^{m,(i)}, H_i). \tag{47}$$

Let $p_i(\mathbf{r}_s)$ and $p_i(\mathbf{r}_t)$ denote two arbitrary likelihood functions
for sensors $s$ and $t$, where $s \neq t$. Assuming independent sensor
noises, it is important to see that $\mathbf{r}_s \sim p_i(\mathbf{r}_s)$ and $\mathbf{r}_t \sim p_i(\mathbf{r}_t)$
are independent if and only if

$$\mathcal{I}_s \cap \mathcal{I}_t = \emptyset. \tag{48}$$

The condition in (48) is required for independence since data
symbols are marginalized out in the likelihood function. We
should note that the implicit assumption in (48) is that the
data symbols are i.i.d. in time, which is a common assumption
in the communications literature. From (48), we can deduce the
general condition for independence. All sensor observations
are independent (across sensors) if and only if

$$\bigcap_{l=1,\ldots,L} \mathcal{I}_l = \emptyset. \tag{49}$$

Physically, the condition in (49) implies that sensor observations,
or the underlying baseband symbol sequences, should
not overlap in time to satisfy independence. This condition
may or may not be realized in practice. One possible way
of obtaining independent sensor observations is to send a
pilot signal to each sensor initiating data collection and leave
enough time between two consecutive pilot signals so that each
sensor observes a different non-overlapping time window of
the same signal.
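In a simulation or a deployment script, the non-overlap requirement can be checked directly on the symbol index sets. The sketch below is plain Python with a hypothetical function name; it applies the pairwise condition (48) to every pair of sensors, which corresponds to the physical requirement that no two observation windows overlap in time.

```python
def windows_are_disjoint(index_sets):
    """Return True if no symbol index is observed by more than one sensor.

    Each element of index_sets is an iterable of the symbol indices seen by
    one sensor; condition (48) is checked for every pair of sensors.
    """
    seen = set()
    for indices in index_sets:
        indices = set(indices)
        if seen & indices:  # overlap with an earlier sensor's window
            return False
        seen |= indices
    return True


# Three sensors, each observing a separate block of 100 symbols.
separate = [range(0, 100), range(100, 200), range(200, 300)]
# Sensors 1 and 2 share symbols 50..99.
overlapping = [range(0, 100), range(50, 150), range(200, 300)]

ok = windows_are_disjoint(separate)
bad = windows_are_disjoint(overlapping)
```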

Suppose the condition in (49) is satisfied. Let $p_i^F$ denote
the likelihood function at the fusion center for modulation $i$
defined as

$$p_i^F := p(\mathbf{r}_1, \ldots, \mathbf{r}_L|H_i) = \prod_{l=1}^{L} \prod_{n \in \mathcal{I}_l} p(r_n|H_i). \tag{50}$$

We can now define

$$\Lambda_i^F := -\frac{1}{LN} \log p_i^F = -\frac{1}{LN} \sum_{l=1}^{L} \log p_i(\mathbf{r}_l|H_i) = -\frac{1}{LN} \sum_{l=1}^{L} \sum_{n \in \mathcal{I}_l} \log p(r_n|H_i). \tag{51}$$

Note that the independence condition is necessary in order for
the second equality in (51) to hold. Then, the ML classifier is
given as

$$\hat{i} = \arg\min_{i=1,\ldots,S} \Lambda_i^F. \tag{52}$$

Theorem 3: As $\sum_{l=1}^{L} |\mathcal{I}_l| \to \infty$,$^2$ the ML classifier in
(52) achieves zero probability of error for classifying digital
amplitude-phase modulations regardless of the received SNRs
at sensors.

Proof: The proof follows the same steps as in Theorem
1 and is omitted here for brevity. ■
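The soft decision fusion behind (50) through (52) amounts to summing the per-sensor log-likelihoods for each candidate and taking the best one. The sketch below assumes NumPy, a coherent model with a known and possibly different noise power at each sensor, and non-overlapping windows (modeled here by drawing fresh symbols per sensor); all names and numerical values are illustrative assumptions.

```python
import numpy as np


def psk(M):
    """Unit-power M-PSK constellation."""
    return np.exp(2j * np.pi * np.arange(M) / M)


def sensor_log_likelihoods(r, candidates, N0):
    """log p_i(r_l) for every candidate i at one sensor, as in (46)-(47)."""
    out = []
    for c in candidates:
        expo = -np.abs(r[:, None] - c[None, :]) ** 2 / N0
        peak = expo.max(axis=1, keepdims=True)
        log_p = (peak[:, 0] + np.log(np.exp(expo - peak).sum(axis=1))
                 - np.log(len(c)) - np.log(np.pi * N0))
        out.append(log_p.sum())
    return np.array(out)


rng = np.random.default_rng(5)
candidates = [psk(2), psk(4), psk(8)]
noise_powers = [0.1, 0.15, 0.2, 0.3]  # one known N0 per sensor
N = 200

total = np.zeros(len(candidates))
for N0 in noise_powers:
    symbols = rng.choice(candidates[1], size=N)  # fresh symbols: disjoint windows
    noise = np.sqrt(N0 / 2) * (rng.standard_normal(N) + 1j * rng.standard_normal(N))
    # Each sensor reports its unquantized log-likelihoods (soft decision).
    total += sensor_log_likelihoods(symbols + noise, candidates, N0)

fused_stats = -total / (len(noise_powers) * N)  # fused statistic in (51)
decision = int(np.argmin(fused_stats))          # fusion rule in (52)
```

Summing log-likelihoods is valid here only because the windows are disjoint; with overlapping windows the product form in (50) no longer holds.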

B. Noncoherent Reception with Unknown SNR 

In this scenario, the received complex signal at sensor $l$ can
be expressed as

$$r_{n_l} = a_l e^{j\theta_l} I_{n_l} + w_{n_l}, \quad n_l \in \mathcal{I}_l. \tag{53}$$

The vector of unknowns for sensor $l$ is $\mathbf{u}_l = [a_l, \theta_l, N_{0_l}]$. Let
us first consider the HLRT approach where sensor $l$ computes
its likelihood by first marginalizing over the unknown symbols
$I_{n_l}$, $n_l \in \mathcal{I}_l$, and then plugging in the MLE of $\mathbf{u}_l$. Let us
define the vector of observations at the fusion center as
$\mathbf{r}_0 := [\mathbf{r}_1, \ldots, \mathbf{r}_L]$. Suppose that the independence condition in (49)
is satisfied. Let $p_i^H(\mathbf{r}_0)$ denote the likelihood function at the
fusion center for the HLRT given as

$$p_i^H(\mathbf{r}_0) := p(\mathbf{r}_0|H_i, \hat{\mathbf{u}}_1, \ldots, \hat{\mathbf{u}}_L) = \prod_{l=1}^{L} \prod_{n_l \in \mathcal{I}_l} p(r_{n_l}|H_i, \hat{\mathbf{u}}_l). \tag{54}$$

$^2$ $|\cdot|$ is the cardinality operator.

Following the same reasoning as in the single sensor scenario,
we can claim that $P_e \to 0$ as $N \to \infty$ using Theorem 1.
However, the same result cannot be claimed for finite $N$ even
when $L \to \infty$ due to different unknown parameters at different
sensors.

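If a subset of unknowns are marginalized out in the HLRT
approach (see Section IV-B, eqs. (41)-(44)), the distribution at
the fusion center takes the following form:

$$p(\mathbf{r}_0|H_i, \hat{\mathbf{u}}_{1(i)}^1, \ldots, \hat{\mathbf{u}}_{L(i)}^1) = \prod_{l=1}^{L} p(\mathbf{r}_l|H_i, \hat{\mathbf{u}}_{l(i)}^1), \tag{55}$$

where $\hat{\mathbf{u}}_{l(i)}^1$ denotes the ML estimate of the remaining unknown
parameters of sensor $l$ under $H_i$, i.e.,

$$\hat{\mathbf{u}}_{l(i)}^1 = \arg\max_{\mathbf{u}^1} p_i(\mathbf{r}_l|\mathbf{u}^1) \tag{56}$$

where

$$p_i(\mathbf{r}_l|\mathbf{u}^1) = \int p_i(\mathbf{r}_l|\mathbf{u}^1, \mathbf{u}^0) \, f_{\mathbf{u}^0}(\mathbf{u}^0) \, d\mathbf{u}^0. \tag{57}$$

Then, the ML classifier is given as

$$\hat{i} = \arg\max_{i=1,\ldots,S} p(\mathbf{r}_0|H_i, \hat{\mathbf{u}}_{1(i)}^1, \ldots, \hat{\mathbf{u}}_{L(i)}^1). \tag{58}$$

Similar to (44), since the unknowns $[a_l, N_{0_l}, \theta_l]$, $l = 1, \ldots, L$,
stay constant over the observation interval, it is clear from (57)
that the observations $r_{n_l}$ become dependent after averaging,
i.e.,

$$p_i(\mathbf{r}_l|\mathbf{u}^1) \neq \prod_{n_l \in \mathcal{I}_l} p(r_{n_l}|H_i, \mathbf{u}^1). \tag{59}$$

Therefore, these classifiers do not have provably vanishing $P_e$
in the asymptotic regime as $N \to \infty$ due to dependence or
as $L \to \infty$ due to different unknown parameters at different
sensors.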

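Let us now consider the ALRT approach, where all the unknowns are marginalized out. Denote the joint a priori distribution of $\mathbf{u}_l$ as $f_{\mathbf{u}}(\mathbf{u})$. Let $p_i^A(\mathbf{r}_0)$ denote the likelihood function at the fusion center for ALRT, defined as

$$p_i^A(\mathbf{r}_0) := \prod_{l=1}^{L} p^A(\mathbf{r}_l | H_i), \tag{60}$$

where

$$p^A(\mathbf{r}_l | H_i) = \int p(\mathbf{r}_l | H_i, \mathbf{u})\, f_{\mathbf{u}}(\mathbf{u})\, d\mathbf{u}. \tag{61}$$

Now, define the following:

$$\Lambda_i^A := -\frac{1}{L} \log p_i^A(\mathbf{r}_0) = -\frac{1}{L} \sum_{l=1}^{L} \log p^A(\mathbf{r}_l | H_i). \tag{62}$$

The ML classifier is given as

$$\hat{i} = \arg\min_{i = 1, \ldots, S} \Lambda_i^A. \tag{63}$$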

For ALRT, we consider a special case where $N_0$ is known², $a$ is Rayleigh distributed with $E[a^2] = \Gamma$, and $\theta$ is uniformly distributed over $[-\pi, \pi]$, i.e., $\theta \sim U[-\pi, \pi]$. From [1], we can write the conditional pdf at sensor $l$ as in (64), where $C$ is a normalizing constant that is identical for all modulation classes. Note that the expectation in (64) requires summation over $M_i^N$ combinations of constellation sequences, which may be computationally prohibitive for large $N$. Alternatively, (64) can be computed by changing the order of the averaging operations, i.e., by first averaging over the unknown constellation symbols and then averaging over the unknown channel phase and channel amplitude. This alternative approach does not result in a closed-form expression; therefore, it needs to be computed using numerical techniques.

² When there is no non-stationary interference in the environment, $N_0$ corresponds to the stationary sensor background noise power, which can be accurately estimated using offline techniques.

Lemma 2: Let $\mathcal{S}$ denote the set of PSK and QAM modulation classes. Define $p_i^A(\mathbf{r}_l) := p^A(\mathbf{r}_l | H_i)$ as given in (64). For $i, j \in \mathcal{S}$, if $i \neq j$ and $N > 1$, then $D(p_i^A(\mathbf{r}_l) \| p_j^A(\mathbf{r}_l)) > 0$.
Proof: See Appendix B. ■
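The condition $N > 1$ in Lemma 2 can be checked numerically by averaging in the alternative order described above: first over the equally likely symbols, then over phase and amplitude. The sketch below is only an illustration under stated assumptions, not the closed form in (64): the phase is averaged on a uniform grid, the Rayleigh amplitude is averaged over pseudo-random draws shared by both hypotheses, and the names `psk` and `alrt_loglik` as well as the values of $N_0$ and $\Gamma$ are hypothetical.

```python
import numpy as np

def psk(M):
    """Unit-energy M-PSK constellation (assumed for illustration)."""
    return np.exp(2j * np.pi * np.arange(M) / M)

def alrt_loglik(r, const, N0, Gamma, n_theta=64, n_amp=200, seed=0):
    """Numerically averaged log-likelihood of the vector r under one hypothesis.

    Symbols are averaged first (equally likely), then the result is averaged
    over a uniform phase grid and over Rayleigh amplitude draws with
    E[a^2] = Gamma. The fixed seed keeps the draws identical across hypotheses.
    """
    rng = np.random.default_rng(seed)
    a = rng.rayleigh(scale=np.sqrt(Gamma / 2), size=n_amp)   # E[a^2] = 2*scale^2
    theta = 2 * np.pi * np.arange(n_theta) / n_theta
    gain = a[:, None] * np.exp(1j * theta[None, :])           # (n_amp, n_theta)
    mu = gain[:, :, None, None] * const[None, None, None, :]  # mixture means
    d2 = np.abs(r[None, None, :, None] - mu) ** 2              # (n_amp, n_theta, N, M)
    per_sample = np.mean(np.exp(-d2 / N0), axis=3) / (np.pi * N0)
    joint = np.prod(per_sample, axis=2)                        # product over n, given (a, theta)
    return np.log(np.mean(joint))

N0, Gamma = 0.5, 1.0
r1 = np.array([0.7 + 0.2j])                 # a single observation, N = 1
r2 = np.array([1.0 + 0.0j, 1.0 + 0.0j])     # vector of ones, N = 2
gap_N1 = alrt_loglik(r1, psk(2), N0, Gamma) - alrt_loglik(r1, psk(4), N0, Gamma)
gap_N2 = alrt_loglik(r2, psk(2), N0, Gamma) - alrt_loglik(r2, psk(4), N0, Gamma)
print(gap_N1, gap_N2)
```

For a single observation the BPSK and QPSK values coincide up to rounding, because the phase grid is closed under the rotations that map one PSK constellation onto itself. For $N = 2$ and the all-ones vector the two values differ, in line with the argument of Appendix B.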

Theorem 4: Suppose $N_0$ is known, $a$ is Rayleigh distributed, and $\theta$ is uniformly distributed over $[-\pi, \pi]$. Then the ML classifier in (63) achieves zero probability of error as $L \to \infty$.

Proof: The proof follows from Lemma 2 and the same method as in Theorem 1. ■

Theorem 4 ensures that asymptotically vanishing $P_e$ is guaranteed in the number of sensors if ALRT is used at each sensor, provided that each sensor has independent observations, i.e., each sensor observes a non-overlapping time window of the transmitted signal. In other words, using a multi-sensor approach ensures asymptotically vanishing $P_e$ for ALRT, which is not provably the case for a single sensor, as explained in Section IV.
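The way a strictly positive divergence translates into vanishing error can be sketched as follows. This is an outline under assumptions, not the omitted proof: it assumes the sensors' observation vectors are i.i.d. under the true hypothesis $H_j$ (same priors and same $N$ at every sensor) and that the argument proceeds through the law of large numbers over sensors. For any $i \neq j$:

```latex
\Lambda_i^A - \Lambda_j^A
  = \frac{1}{L}\sum_{l=1}^{L}
    \log\frac{p^A(\mathbf{r}_l \mid H_j)}{p^A(\mathbf{r}_l \mid H_i)}
  \;\xrightarrow[L \to \infty]{}\;
  E_{H_j}\!\left[\log\frac{p^A(\mathbf{r}_l \mid H_j)}{p^A(\mathbf{r}_l \mid H_i)}\right]
  = D\!\left(p_j^A(\mathbf{r}_l)\,\big\|\,p_i^A(\mathbf{r}_l)\right) > 0 .
```

Under these assumptions the statistic of the true hypothesis is eventually the smallest one, so the arg min picks $H_j$. The strict inequality is exactly what requires $N > 1$ at each sensor.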

C. Fusion Rule 

In this section, we analyze the implications of the independence condition in (49) for decision fusion based modulation classification. For a finite number of observations ($N < \infty$), it is clear that if (49) is not satisfied, there are sensors observing the same baseband sequence, resulting in dependent observations due to averaging over unknown constellation symbols. If (49) is not satisfied, even though the noise at each sensor is independent, the joint conditional distribution at the fusion center cannot be written as a product of individual conditional distributions, i.e.,

$$p_i(\mathbf{r}_1, \ldots, \mathbf{r}_L) \neq \prod_{l=1}^{L} p_i(\mathbf{r}_l). \tag{65}$$

However, in the asymptotic regime as $N \to \infty$, we have the following theorem.

Theorem 5: Suppose there are two groups of $L$ sensors, denoted as $Q$ and $Q'$, observing the same signal with unknown modulation. Suppose the sensors in $Q$ have arbitrary overlaps in their observations and the sensors in $Q'$ have no overlaps. Let $\mathbf{r}_l$ and $\mathbf{r}_{l'}$, $l = 1, \ldots, L$, denote the observations from the sensors in $Q$ and $Q'$, respectively. Let $p_i(\mathbf{r}_l)$ ($p_i(\mathbf{r}_{l'})$) denote the likelihood function of sensor $l$ ($l'$) under $H_i$, which represents
either a coherent scenario with known SNR as in ( l46l l or a 
noncoherent scenario with unknown SNR in the forms of HLR 
or ALR as in dZTl i or ( fSTT i or ( |6TI ). Suppose both groups use 
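[Equation (64): conditional pdf $p^A(\mathbf{r}_l | H_i)$ at sensor $l$. The typeset expression did not survive extraction; the remaining fragments show an exponential term involving $\Gamma$, $\mathbf{I}^{(i)H}\mathbf{r}_l$ and $N_0$.]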



the same fusion rule to classify the unknown modulation, given as:

$$Q: \quad \hat{i} = \arg\max_{i} \prod_{l=1}^{L} p_i(\mathbf{r}_l), \tag{66}$$

$$Q': \quad \hat{i} = \arg\max_{i} \prod_{l=1}^{L} p_i(\mathbf{r}_{l'}). \tag{67}$$

Let $P_e$ and $P_e'$ denote the probabilities of classification error for the fusion rules in (66) and (67), respectively. As $N \to \infty$, we have the following result:

$$\lim_{N \to \infty} (P_e' - P_e) = 0. \tag{68}$$



Proof: Sensor observations in $Q$ are dependent. This dependence results solely from overlapping sensor observations, regardless of the scenario under consideration and regardless of which classification algorithm is employed (HLR or ALR). Suppose $H_i$ is the hypothesis under consideration. Let $\mathcal{M}_i$ denote the set of constellation symbols for modulation $i$ with $|\mathcal{M}_i| = M_i$, and let $I_n$, $n = 1, \ldots, N$, denote the constellation symbol sequence received by an arbitrary sensor. Suppose $s_m^{(i)} \in \mathcal{M}_i$ and let $1_{s_m^{(i)}}(I_n)$ denote the indicator function defined as $1_{s_m^{(i)}}(I_n) = 1$ if $I_n = s_m^{(i)}$, and $1_{s_m^{(i)}}(I_n) = 0$ otherwise. Now, define

$$n(s_m^{(i)}) := \sum_{n=1}^{N} 1_{s_m^{(i)}}(I_n), \tag{69}$$

which represents the number of occurrences of $s_m^{(i)}$ in the received symbol sequence $\{I_1, \ldots, I_N\}$. Now, take the limit

$$\lim_{N \to \infty} \frac{n(s_m^{(i)})}{N} \overset{(a)}{=} E_{I_n}\!\left[1_{s_m^{(i)}}(I_n)\right] \overset{(b)}{=} \frac{1}{M_i}, \tag{70}$$

where (a) results from applying the law of large numbers and (b) results from the fact that each symbol in the constellation set $\mathcal{M}_i$ is equally likely. We can rewrite (70) as

$$\lim_{N \to \infty} n(s_m^{(i)}) = \frac{N}{M_i}, \tag{71}$$

which implies that as $N \to \infty$, each constellation symbol $s_m^{(i)} \in \mathcal{M}_i$ has an identical number of occurrences. Therefore, in the asymptotic regime ($N \to \infty$), each sensor observes every constellation symbol the same number of times, whether or not those symbols overlap across sensors.
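The law-of-large-numbers step in (70) is easy to observe empirically. The sketch below draws equally likely symbol indices and reports the relative frequency of each symbol; the function name `symbol_fractions` and the choices $M = 16$ and the two values of $N$ are illustrative assumptions only.

```python
import numpy as np

def symbol_fractions(M, N, seed=0):
    """Relative frequency of each of M equally likely symbols in N draws."""
    rng = np.random.default_rng(seed)
    idx = rng.integers(0, M, size=N)           # symbol index per observation
    counts = np.bincount(idx, minlength=M)     # n(s_m) for m = 0, ..., M-1
    return counts / N

M = 16
fractions_small = symbol_fractions(M, N=100)
fractions_large = symbol_fractions(M, N=1_000_000)
max_dev_small = np.max(np.abs(fractions_small - 1 / M))
max_dev_large = np.max(np.abs(fractions_large - 1 / M))
print(max_dev_small, max_dev_large)
```

The largest deviation from $1/M$ shrinks as $N$ grows, and constellations with more symbols need more observations before every symbol is seen about equally often.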

Now, consider sensor $l$ and let $I_{l_n}$ denote the $n$-th symbol received by sensor $l$. Note that $p_i(\mathbf{r}_l) = \prod_{k=1}^{N} p_i(r_{l_k})$ is permutation invariant with respect to $\mathbf{r}_l = [r_{l_1}, \ldots, r_{l_N}]$ (or $\{I_{l_1}, \ldots, I_{l_N}\}$), because each $I_{l_n}$ is i.i.d. and the background noise is white. In other words, $p_i(\mathbf{r}_l)$ is invariant to the order of the received symbol sequence $\{I_{l_1}, \ldots, I_{l_N}\}$. Let us define a virtual sensor indexed by $l'$ and suppose that it observes a symbol sequence $\{I_{l'_1}, \ldots, I_{l'_N}\}$ that does not overlap with those observed by other sensors, i.e., $\{I_{l_1}, \ldots, I_{l_N}\}$ and $\{I_{l'_1}, \ldots, I_{l'_N}\}$ represent i.i.d. symbol sequences. As we let $N \to \infty$, the numbers of occurrences of each symbol in $\{I_{l_1}, \ldots, I_{l_N}\}$ and $\{I_{l'_1}, \ldots, I_{l'_N}\}$ become identical from (71).

This implies that $\{I_{l'_1}, \ldots, I_{l'_N}\}$ becomes a re-ordered version of $\{I_{l_1}, \ldots, I_{l_N}\}$. In this case (as $N \to \infty$), the elements of the observation vector $\mathbf{r}_l$ can be re-ordered to form a new observation vector $\mathbf{r}_{l'}$ such that it represents noisy observations of the virtual symbol sequence $\{I_{l'_1}, \ldots, I_{l'_N}\}$. It follows that, since $p_i(\mathbf{r}_l)$ is permutation invariant with respect to $\mathbf{r}_l$, we have the following equality as $N \to \infty$:

$$p_i(\mathbf{r}_l) = p_i(\mathbf{r}_{l'}). \tag{72}$$

Similarly, we can follow the same argument as above and show that $p_i(\mathbf{r}_l) = p_i(\mathbf{r}_{l'})$ for every $l = 1, \ldots, L$. This implies that as $N \to \infty$,

$$\prod_{l=1}^{L} p_i(\mathbf{r}_l) = \prod_{l=1}^{L} p_i(\mathbf{r}_{l'}). \tag{73}$$

Finally, the above equality implies that as $N \to \infty$,

$$\arg\max_{i} \prod_{l=1}^{L} p_i(\mathbf{r}_l) = \arg\max_{i} \prod_{l=1}^{L} p_i(\mathbf{r}_{l'}), \tag{74}$$

which concludes the proof. ■
The above result shows that, as $N \to \infty$, we can always re-arrange the order of the original observations and create an equivalent system with independent observations. The new system has the same classification performance as the original one, provided that both systems use the same fusion rule.
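The permutation invariance that the proof relies on can be verified directly for a coherent per-sensor likelihood. The following sketch assumes a unit-average-energy square 16-QAM constellation, known noise power and unit channel gain; the helper names `qam16` and `loglik` are hypothetical and the numbers are illustrative.

```python
import numpy as np

def qam16():
    """Square 16-QAM normalized to unit average energy (assumed layout)."""
    levels = np.array([-3.0, -1.0, 1.0, 3.0])
    pts = (levels[:, None] + 1j * levels[None, :]).ravel()
    return pts / np.sqrt(np.mean(np.abs(pts) ** 2))

def loglik(r, const, N0):
    """Sum over samples of the log of the symbol-averaged Gaussian likelihood."""
    d2 = np.abs(r[:, None] - const[None, :]) ** 2
    return np.sum(np.log(np.mean(np.exp(-d2 / N0), axis=1) / (np.pi * N0)))

rng = np.random.default_rng(1)
N, N0 = 500, 1.0
const = qam16()
symbols = rng.choice(const, size=N)
noise = np.sqrt(N0 / 2) * (rng.standard_normal(N) + 1j * rng.standard_normal(N))
r = symbols + noise

ll_original = loglik(r, const, N0)
ll_permuted = loglik(rng.permutation(r), const, N0)   # same samples, shuffled order
print(ll_original, ll_permuted)
```

Because the likelihood is a product over samples, shuffling the observation vector leaves it unchanged up to floating-point rounding, which is the property used to pass from $\mathbf{r}_l$ to the re-ordered vector $\mathbf{r}_{l'}$.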

Remark 2: We know that the optimal fusion rule for $Q'$, which minimizes $P_e'$, is given as $\hat{i} = \arg\max_i \prod_{l=1}^{L} p_i(\mathbf{r}_{l'})$. The practical implication of Theorem 5 is that, for large $N$, regardless of any overlap in the sensor observations, the fusion rule $\hat{i} = \arg\max_i \prod_{l=1}^{L} p_i(\mathbf{r}_l)$ will achieve the best performance that can be achieved by a multi-sensor system with independent sensor observations. Practical values of $N$ for which this performance can be achieved are provided by the numerical results in Section VI for different modulation classification scenarios.

In practice, it may be impossible to characterize the depen- 
dence in sensor observations as sensors may have arbitrary 
and unknown overlaps in their observations. In this case, the 
optimal fusion rule simply cannot be derived and the fusion 
rule that assumes independence becomes a natural choice. 
Theorem 5 provides an asymptotic performance guarantee for such a scenario.



VI. Numerical Results 

In this section, we provide numerical results that corroborate our analyses in Sections IV and V. First, we consider the single-sensor case and investigate two classification scenarios:



IEEE TRANSACTIONS ON WIRELESS COMMUNICATIONS (DRAFT) 






1) binary classification of BPSK versus (vs.) QPSK, 2) 3-ary classification of 16-PSK vs. 16-QAM vs. 32-QAM. Figures 2 and 3 show $P_e$ versus the number of observations ($N$) under two different SNR regimes. The results are obtained using 2000 Monte Carlo simulations. The difference between the two figures is that the former assumes a coherent scenario with known SNR, whereas the latter assumes a noncoherent scenario with unknown SNR for which HLRT is used as the classifier. It is clear from both figures that $P_e$ decreases monotonically as $N$ increases under both SNR regimes, which supports the analyses of Theorems 1 and 2. As expected, the rate of decrease in $P_e$ is slower under 0 dB SNR than under 6 dB SNR. Since Theorem 3 is an extension of Theorem 1 to a multi-sensor case, we do not provide additional results for that particular scenario.

Fig. 4 demonstrates the performance of ALRT for classification of BPSK vs. QPSK with respect to the number of sensors ($L$) under two different SNR regimes. Each sensor receives a Rayleigh faded signal with an average channel SNR defined as $E[a^2]/N_0 = \Gamma/N_0$. The number of observations per sensor is set to $N = 4$. Similar to the previous cases, 2000 Monte Carlo simulations are used to obtain the results. As stated by Theorem 4 and shown in Fig. 4, $P_e$ decreases monotonically as $L$ gets larger, regardless of the SNR regime. Furthermore, the rate of decrease in $P_e$ is slower for smaller SNR values, as expected.

Finally, Figures 5 and 6 illustrate how the fusion rule that assumes independent sensor decisions behaves asymptotically under 0 dB SNR for two different classification scenarios: 1) binary classification of 16-PSK vs. 16-QAM, and 2) 3-ary classification of 64-QAM vs. 128-QAM vs. 256-QAM, respectively. Both figures assume coherent scenarios with known SNRs. In the figures, "Independent Observations" refers to the case where the condition in (49) is satisfied, i.e., each sensor observes a non-overlapping window of the signal, whereas "Dependent Observations" is the case where each sensor observes the same window, i.e., there is complete overlap between sensor observations. Results are obtained using
lO'' Monte Carlo simulations. In Fig. |5] each marked point 
represents $L \times N = 1000$ observations, and those points correspond to $N = \{1, 2, 5, 10, 20, 50, 100, 250, 500\}$, resulting in $L = \{1000, 500, 200, 100, 50, 20, 10, 4, 2\}$. When sensor observations are independent, $P_e$ is identical for all the points where $L \times N$ is constant. This is shown in both figures under the "Independent Observations" case. It is clear from Fig. 5 that, as $N$ grows, the performance of the two systems converges, supporting the analysis in Theorem 5. For this particular scenario, when $N = 250$ and $L = 4$, the classification performance of the system with dependent observations is almost identical to that with independent observations, where both fusion rules assume independent observations. In Fig. 6, each marked point represents $L \times N = 3000$ observations, and those points correspond to $N = \{10, 20, 50, 100, 250, 500, 750, 1000, 1500\}$, resulting in $L = \{300, 150, 60, 30, 12, 6, 4, 3, 2\}$. For this scenario, when $N = 1000$ and $L = 3$, the classification performance of the system with dependent observations is almost identical to that with independent observations. We note that the convergence of the former scenario in Fig. 5 is faster than that of the latter in Fig. 6. This is due to the difference between the cardinalities of the constellation sets under consideration. Modulations with larger constellation sets require more observations for the mixing in (70) to take place. Therefore, the practical values of $N$ for which the two systems that use the same fusion rule behave identically depend on the classification scenario under consideration.
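The marked points in Figs. 5 and 6 keep the total number of observations $L \times N$ fixed while trading sensors for samples per sensor. The short check below reproduces the $L$ values quoted above from the $N$ values; the helper name `sensors_for_budget` is hypothetical.

```python
def sensors_for_budget(total, n_values):
    """Number of sensors L for each per-sensor sample count N, with L * N = total."""
    result = []
    for n in n_values:
        if total % n != 0:
            raise ValueError(f"N = {n} does not divide the budget {total}")
        result.append(total // n)
    return result

# Fig. 5 setting: L x N = 1000
L_fig5 = sensors_for_budget(1000, [1, 2, 5, 10, 20, 50, 100, 250, 500])
# Fig. 6 setting: L x N = 3000
L_fig6 = sensors_for_budget(3000, [10, 20, 50, 100, 250, 500, 750, 1000, 1500])
print(L_fig5)
print(L_fig6)
```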

VII. Conclusion 

In this paper, we have investigated the asymptotic behavior of LB modulation classification systems under two different scenarios: 1) coherent reception with known SNR, and 2) noncoherent reception with unknown SNR. Both a single-sensor setting and a multi-sensor setting that uses a distributed decision fusion approach are analyzed. In a single-sensor setting, it has been shown that $P_e$ vanishes asymptotically in the number of observations ($N$) under coherent reception with known SNR. Under noncoherent reception with unknown SNR, HLRT achieves perfect classification, i.e., $P_e \to 0$, in the asymptotic regime as $N \to \infty$, whereas this is not provably the case for ALRT. This property of HLRT is due to the consistency of the ML estimator as well as the statistical independence of data symbols in time. In a multi-sensor setting, under the assumption of independent sensor observations, it has been shown that perfect classification is achieved, i.e., $P_e \to 0$, in the asymptotic regime as the number of sensors $L \to \infty$, provided that each sensor employs ALRT, regardless of the number of observations ($N$). However, this is not provably the case when each sensor employs HLRT using a finite number of samples ($N < \infty$). Finally, the asymptotic analysis of the fusion rule that assumes independent sensor observations is carried out. It has been shown that this fusion rule asymptotically achieves the best performance that can be achieved by a system employing independent sensor observations. The asymptotic results derived in this paper have practical implications in that they provide design guidelines as to which LB classification method should be selected for the specific scenario under consideration. Furthermore, they provide theoretical asymptotic performance guarantees for practical systems, which would otherwise be unknown.

As future work, it would be interesting to investigate the case where each sensor makes hard decisions, i.e., quantized likelihoods are sent to the fusion center instead of soft decisions (analog likelihoods) as assumed in this paper, and the fusion center employs hard decision fusion for modulation classification. We conjecture that, under independent identical quantizer assumptions, one would obtain asymptotic results similar to those for the soft decision fusion analyzed in this paper. Nevertheless, a rigorous treatment would be useful. Furthermore, we would like to incorporate additional unknown signal parameters, such as frequency and time offsets, into the signal model for similar asymptotic analyses in the future.

Appendix A
Proof of Lemma 1

It is sufficient to show that if $i \neq j$, then $p(r|H_i, \mathbf{u}_i)$ and $p(r|H_j, \mathbf{u}_j)$ are not identical distributions for any $\mathbf{u}_i$, $\mathbf{u}_j$. We note from the single-observation likelihood that each $p(r|H_i, \mathbf{u}_i)$ is a complex GMM with $M_i$ components, where each component has the same occurrence probability $1/M_i$, i.e.,

$$p(r|H_i, \mathbf{u}_i) = \sum_{m=1}^{M_i} \frac{1}{M_i}\, \mathcal{CN}\!\left(r;\, \mu_{(i,m)},\, N_{0_i}\right). \tag{75}$$

If the transmitted signal is an M-PSK signal, then $\mu_{(i,m)} \in S_P^i$. Otherwise, if the transmitted signal is an M-QAM signal, then $\mu_{(i,m)} \in S_Q^i$. From (75), the mean value of each component in the GMM corresponds to a unique constellation symbol (in the constellation map of modulation format $i$) scaled by $a_i$ and rotated by $\theta_i$. The variance of each component is $N_{0_i}$. For different modulation classes $i$ and $j$, there are two cases to be considered:

i) Case-1: Modulations $i$ and $j$ represent two modulation classes with different numbers of constellation symbols. In this case, $p_i(r|\mathbf{u}_i)$ and $p_j(r|\mathbf{u}_j)$ represent two GMMs with different numbers of components, i.e., $M_i \neq M_j$. Therefore, $p_i(r|\mathbf{u}_i)$ and $p_j(r|\mathbf{u}_j)$ are not identical distributions and, hence, $D(p_i(r|\mathbf{u}_i) \| p_j(r|\mathbf{u}_j)) > 0$.

ii) Case-2: Modulations $i$ and $j$ represent two modulation classes with the same number of constellation symbols. In this case, one of the modulation classes is M-PSK and the other is M-QAM. Suppose modulations $i$ and $j$ represent M-PSK and M-QAM, respectively. In this case, the mean value of each component in the GMM is given by $\mu_{(i,m)} \in S_P^i = \{a_i e^{j(2\pi m/M + \theta_i)} \,|\, m = 0, \ldots, M-1\}$ and $\mu_{(j,m)} \in S_Q^j = \{a_j b_m e^{j(\phi_m + \theta_j)} \,|\, m = 0, \ldots, M-1\}$. We know from the M-QAM constellation symbol set that there exist $m_1$ and $m_2$ such that $b_{m_1} \neq b_{m_2}$. In order for $p_i(r|\mathbf{u}_i)$ and $p_j(r|\mathbf{u}_j)$ to be identical, the following condition should be satisfied:

$$a_i e^{j(2\pi m/M + \theta_i)} = a_j b_m e^{j(\phi_m + \theta_j)}, \quad m = 0, \ldots, M-1. \tag{76}$$

Now suppose $p_i(r|\mathbf{u}_i)$ and $p_j(r|\mathbf{u}_j)$ are identical and consider $m_1$ and $m_2$ such that $b_{m_1} \neq b_{m_2}$. Then, from (76), we can write $a_i e^{j(2\pi m_1/M + \theta_i)} = a_j b_{m_1} e^{j(\phi_{m_1} + \theta_j)}$, which implies that $a_i/a_j = b_{m_1}$. Since $p_i(r|\mathbf{u}_i)$ and $p_j(r|\mathbf{u}_j)$ are identical, we can also write from (76) that $a_i e^{j(2\pi m_2/M + \theta_i)} = a_j b_{m_2} e^{j(\phi_{m_2} + \theta_j)}$, implying that $a_i/a_j = b_{m_2}$, which is a contradiction because $b_{m_1} \neq b_{m_2}$. Then, $p_i(r|\mathbf{u}_i)$ and $p_j(r|\mathbf{u}_j)$ must be different GMMs; therefore, $D(p_i(r|\mathbf{u}_i) \| p_j(r|\mathbf{u}_j)) > 0$. ■

Appendix B
Proof of Lemma 2

We drop the sensor index $l$ for simplicity of presentation. There are three cases to be considered:

i) Case-1: Modulations $i$ and $j$ represent two different PSK modulations, i.e., $I_n^{(i)} \in S_P^i = \{e^{j2\pi m/M_i} \,|\, m = 1, \ldots, M_i\}$, where $M_i = 2^{k_i}$, $k_i \in \mathbb{N}$. First, suppose $N = 1$. Then, under $H_i$, (64) becomes

[Equation (77): the typeset expression did not survive extraction; the remaining fragments show an exponential term together with the factor $1 + \frac{\Gamma}{N_0}|I_m^{(i)}|^2$, and a step labeled (a).]

where (a) follows from $N = 1$ and each symbol being equally likely, with $|I_m^{(i)}|^2 = 1$, $\forall m$. We note that (77) is independent of $H_i$. Therefore, $p^A(r|H_i) = p^A(r|H_j)$, which implies that $D(p_i^A(r) \| p_j^A(r)) = 0$ for $N = 1$. Now suppose $N > 1$. In order to show that $D(p_i^A(\mathbf{r}) \| p_j^A(\mathbf{r})) > 0$, it suffices to show that there exists an $\mathbf{r}_0$ such that $p^A(\mathbf{r}_0|H_i) \neq p^A(\mathbf{r}_0|H_j)$. Let us set $\mathbf{r}_0 = \mathbf{1}$ (vector of ones) and write

[Equation (78): the typeset expression did not survive extraction; the remaining fragments show sums over $m_1, \ldots, m_N$ from 1 to $M_i$ of an exponential whose argument involves $\Gamma$, $N_0$, $N_0 + N\Gamma$ and $\sum_{k=1}^{N} \sum_{h>k} \cos\!\left(2\pi(m_h - m_k)/M_i\right)$.]

where $\Re\{\cdot\}$ denotes the real part of a complex number. We note that, for fixed $N > 1$, (78) cannot be reduced to a constant that is independent of $M_i$, i.e., of $H_i$. In other words, for each $M_i = 2^{k_i}$, $k_i \in \mathbb{N}$, (78) will result in a different value. Therefore, $p^A(\mathbf{1}|H_i) \neq p^A(\mathbf{1}|H_j)$, which implies that $D(p_i^A(\mathbf{r}) \| p_j^A(\mathbf{r})) > 0$ for $N > 1$.

ii) Case-2: Modulations $i$ and $j$ represent two QAM modulations, i.e., $I_n^{(i)} \in S_Q^i = \{b_m e^{j\phi_m} \,|\, m = 1, \ldots, M_i\}$. Using the methodology of Case-1, we can show that $p_i^A(\mathbf{0}) \neq p_j^A(\mathbf{0})$ for $N > 1$, where $\mathbf{0}$ denotes the vector of zeros. Details are omitted for the sake of brevity. Therefore, $D(p_i^A(\mathbf{r}) \| p_j^A(\mathbf{r})) > 0$ for $N > 1$.

iii) Case-3: Modulations $i$ and $j$ represent PSK and QAM modulations, respectively. In this case, similar to the above, we can show that $p_i^A(\mathbf{0}) \neq p_j^A(\mathbf{0})$ for $N > 1$. Details are omitted for the sake of brevity. Therefore, $D(p_i^A(\mathbf{r}) \| p_j^A(\mathbf{r})) > 0$ for $N > 1$. ■

Fig. 2. Coherent scenario with known SNR. $P_e$ versus number of observations ($N$) under two different SNR regimes: 0 dB and 6 dB.

Fig. 3. Noncoherent scenario with unknown SNR. $P_e$ versus number of observations ($N$) under two different SNR regimes: 0 dB and 6 dB.

Fig. 4. ALRT with $N = 4$ observations. $P_e$ versus number of sensors ($L$) under two different SNR regimes: 0 dB and 6 dB.

Fig. 5. $P_e$ with the fusion rule in (66) using dependent vs. independent observations (16-PSK vs. 16-QAM).

Fig. 6. $P_e$ for the fusion rule in (66) using dependent vs. independent observations (64-QAM vs. 128-QAM vs. 256-QAM).